ABSTRACT

This thesis examines the classical measure of correlation (Pearson's R)
and two nonparametric measures of correlation (Spearman's r and Kendall's
tau) with the goal of determining the sample size needed to estimate a
correlation coefficient at a 95% confidence level. For Pearson's R, tables,
graphs, and computer programs are developed to find the sample size
needed for a desired confidence interval size. Nonparametric measures of
correlation (Spearman's r and Kendall's tau) are also examined for appropriate

I.     INTRODUCTION

Everyone wants to know how big a sample is needed. In many forms of
weapon system testing, a decision must be made about the sample size, and
this decision is very important because an unnecessarily large sample takes
extra time and increases costs. If the purpose of the testing is to estimate a
value, then the test needs to give a good estimate (represented by a small
confidence interval). At the same time, it is desirable to use the smallest
sample size that gives the required accuracy. This thesis develops a way to
find sample sizes when the purpose of the testing is to estimate a correlation
coefficient.

There are many ways to find a sample size. In this thesis, the desired
confidence interval size will be used as the basis for finding the sample size.
It is important to note that the size of the confidence interval depends upon
the number of observations taken and, in general, if a bigger sample is used,
then the confidence interval will be smaller.

The problem of finding the sample size for estimates of proportions, given
a desired confidence interval size, has been studied for a variety of cases [Ref.
1], [Ref. 2] and [Ref. 3]. The work reported here looks at sampling done to
estimate a correlation coefficient, and at the sample size needed to produce
a desired confidence interval for that correlation coefficient. It investigates
and offers guidance on the sample size to use when estimation involves
Pearson's R, and also discusses the sample size problem when nonparametric
statistical methods are employed. For each of these measures the relationship
between sample size and confidence interval size is analyzed, so that graphs
and tables can be provided to help a decision maker find the sample size
needed to obtain a desired confidence interval for a correlation coefficient.


In Chapter II the classical sample measure of correlation (Pearson's R) and
the confidence intervals that can be developed using the normal
approximation method are described. The third chapter addresses sample
size determination for estimating a correlation coefficient using confidence
intervals. It discusses how computer programs were developed and used, and
how graphs and tables were constructed, to determine the sample size
required to obtain a desired 95% confidence interval for a correlation
coefficient. A comparison of methods is made to give easy-to-use results
about sample sizes for estimating correlation coefficient values. Then,
Chapter IV describes the use of the Spearman and Kendall test statistics and
the problem of finding the sample size needed to produce a confidence
interval of desired size. That chapter also compares the sample sizes needed
for a desired confidence interval size using Pearson's R, Spearman's r, and
Kendall's tau.

The  final  chapter  will  summarize  this  research,  and  provide  some 
suggestions  for  further  research  and  study. 


II.     CORRELATION AND THE PEARSON PRODUCT-MOMENT CORRELATION
COEFFICIENT

In this chapter an explanation will be given of how to use the classical
correlation coefficient to obtain a desired confidence interval. First, the
Pearson product-moment correlation coefficient will be studied. Then this
information will be used to show how estimates of the population correlation
coefficient may be obtained. In the final part of this chapter, different
procedures for finding a confidence interval for the population correlation
coefficient by using the normal approximation method will be reviewed.

A.     THE  PEARSON  PRODUCT-MOMENT  CORRELATION  COEFFICIENT 

Before determining any sample sizes, a brief introduction to the
Pearson product-moment correlation coefficient will be provided. Gibbon
states:  "In  general,  if  X  and  Y  are  two  random  variables  with  a  bivariate 
probability  distribution,  their  covariance,  in  a  certain  sense,  reflects  the 
direction  and  amount  of  correlation  or  correspondence  between  the  variables. 
The  covariance  is  large  and  positive  if  there  is  a  high  probability  that  large 
(small)  values  of  X  are  associated  with  large  (small)  values  of  Y.  On  the  other 
hand,  if  the  correspondence  is  inverse  so  that  large  (small)  values  of  X 
generally  occur  in  conjunction  with  small  (large)  values  of  Y,  their  covariance 
is  large  and  negative.  This  comparative  type  of  correlation  is  referred  to  as 
concordance  or  agreement.  The  covariance  parameter  as  a  measure  of 
correlation  is  difficult  to  interpret  because  its  value  depends  on  the  orders  of 
magnitude  and  units  of  the  random  variables  concerned.  A  nonabsolute  or 
relative  measure  of  correlation  circumvents  this  difficulty."   [Ref.  4:  p. 206] 

The Pearson product-moment correlation coefficient, defined as

$$\rho(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}} \tag{2.1}$$

[Ref. 4: p. 206], is invariant under changes of scale and location in X and Y, and
in classical statistics this parameter is usually employed as the measure of
correlation in a bivariate distribution. The absolute value of the correlation
coefficient does not exceed 1, and its sign is determined by the sign of the
covariance. If X and Y are independent random variables, then their
correlation is zero, but the converse is not true in general. "If the main
justification for the use of ρ as a measure of association is that the bivariate
normal is such an important distribution in classical statistics and zero
correlation is equivalent to independence for that particular population, this
reasoning has little significance in nonparametric statistics." [Ref. 4: p. 206]

A measure of correlation between X and Y must satisfy the following
requirements in order to be a good relative measure of association:

•  The value of the measure of correlation should be between -1 and +1;

•  If the larger values of X tend to be paired with the larger values of Y, and
the smaller values of X tend to be paired with the smaller values of Y, then
the measure of correlation should be positive, and if the tendency is
strong then it should be close to +1;

•  If the larger values of X tend to be paired with the smaller values of Y, and
vice versa, then the measure of correlation should be negative, and if the
tendency is strong then it should be close to -1;

•  If the values of X and the values of Y are randomly paired, then the
measure of correlation should be fairly close to zero. This should be the
case when X and Y are independent.

B.     ESTIMATION  OF  THE  POPULATION  CORRELATION  COEFFICIENT 

Most of the time, the value of the population correlation coefficient (ρ) is
unknown and must be estimated from a sample. The sample correlation
coefficient is a random variable which is used in situations where the data
consist of pairs of numbers. A bivariate random sample of size n is
represented by $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

Suppose a random sample of n pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ is drawn
from a bivariate population with Pearson product-moment correlation
coefficient ρ. Then, in classical statistics, the estimate used for ρ is the sample
correlation coefficient R, defined as


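$$R = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \tag{2.2}$$

[Ref. 5: p. 244], where $\bar{X}$ and $\bar{Y}$ are the sample means.
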
If the numerator and denominator in Equation 2.2 are divided by n, then R
becomes

$$R = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \tag{2.5}$$


and it can be seen in Equation 2.5 that the numerator is the sample covariance
and the denominator is the product of the two sample standard deviations (S).
This means that the equation is similar in form to the population correlation
coefficient defined in Equation 2.1.
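
The computation of R in Equation 2.2 can be illustrated with a short sketch. The function name `pearson_r` and the use of Python are assumptions made here for illustration only; the thesis itself used APL programs.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient R as in Equation 2.2 (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    x_bar = sum(xs) / n  # sample mean of X
    y_bar = sum(ys) / n  # sample mean of Y
    # Sums of cross-products and squared deviations about the sample means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R is undefined when either sample is constant")
    return sxy / math.sqrt(sxx * syy)
```

Dividing numerator and denominator by n, as in Equation 2.5, leaves the returned value unchanged, since the factor cancels.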

This sample measure of correlation may be used on a set of data without
any requirements, but it is difficult to interpret unless the scale of
measurement is at least interval. The important point is that R is a random
variable with a distribution function, and the distribution function of R depends
on the bivariate distribution function of (X,Y).

C.     CONFIDENCE  INTERVALS    FOR  THE  CORRELATION  COEFFICIENT. 

If it is desired to determine confidence intervals for ρ (the population
correlation coefficient), then the sampling distribution of the correlation
coefficient R must be known. If (X,Y) is bivariate normal, then the expected
value and variance of R are approximately

$$E(R) \approx \rho \tag{2.6}$$

and

$$\operatorname{Var}(R) \approx \ldots \quad \text{provided } n \text{ is not too small} \tag{2.7}$$

[Ref. 6: p. 462]. Charts of confidence intervals for ρ with a confidence
coefficient of 95 percent already exist. These were determined by F. N. David
and are reproduced in Figure 5 on page 49 in Appendix A. In this figure, the
abscissa is the estimated correlation coefficient from the sample data. For
each given sample size and value of R there is a confidence interval for ρ,
varying as R goes from -1.0 to +1.0. For example, for R = 0.60 and n = 5, the
95 percent confidence interval is about -0.5 < ρ < 0.91.

If a figure similar to that of Appendix A is not available, or if more precise
numerical limits for the interval are wanted, the normal distribution can be
used to obtain an approximation.

The statistic commonly used is

$$Z = \frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) = \tanh^{-1} R, \tag{2.9}$$

which is distributed approximately normally with expected value

$$E(Z) = \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right) \tag{2.10}$$

and variance

$$\sigma^2(Z) \approx \frac{1}{n-3} \tag{2.11}$$

[Ref. 6: p. 463]. Note here that Z is not the standard normal variable. Using
this transformation, the confidence interval for ρ can be calculated. Having
calculated the estimate for ρ, namely R, we compute Z and the statistic

$$K_1 = \frac{Z - E(Z)}{\sigma(Z)} = \left[Z - \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)\right]\sqrt{n-3}, \tag{2.12}$$

where $K_1$ approximately follows a standard normal distribution.

Using the normal approximation, there is 95% confidence that

$$-1.96 < \frac{Z - E(Z)}{\sigma(Z)} < 1.96, \tag{2.13}$$


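and the 95% confidence interval for E(Z) will be

$$\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) - 1.96\,\sigma(Z) < E(Z) = \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right) < \frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) + 1.96\,\sigma(Z). \tag{2.14}$$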


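From 2.10,

$$\exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) - 1.96\,\sigma(Z)\right]\right\} < \frac{1+\rho}{1-\rho} \tag{2.15a}$$

and

$$\frac{1+\rho}{1-\rho} < \exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) + 1.96\,\sigma(Z)\right]\right\}. \tag{2.15b}$$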


If the left side of 2.15a is $L_1$ and the right side of 2.15b is $U_1$, then

$$L_1 < \frac{1+\rho}{1-\rho} < U_1. \tag{2.16}$$

Values for $L_1$ and $U_1$ can be computed from sample results. From 2.16 the 95%
confidence interval for ρ will be

$$\frac{L_1-1}{L_1+1} < \rho < \frac{U_1-1}{U_1+1}. \tag{2.17}$$


For example, if the data has 10 observations and the sample correlation
coefficient is R = 0.60, the 95% confidence interval can be estimated. Using the
confidence belts in Figure 5 on page 49 in Appendix A, the bounds are about
-0.05 and 0.89. These results are rough. Using Equations 2.9, 2.10, and 2.11, we
have

$$Z = \frac{1}{2}\ln\left(\frac{1.6}{0.4}\right) = 0.6932$$

and

$$\sigma(Z) = \frac{1}{\sqrt{n-3}} = 0.378.$$

The 95 percent confidence limits for E(Z) are then

$$0.6932 - 1.96 \times 0.378 < E(Z) < 0.6932 + 1.96 \times 0.378,$$

which reduce to

$$-0.04768 < E(Z) < 1.4341.$$

The inequalities can be written as

$$-0.04768 < \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right) < 1.4341,$$

and combining Equations 2.15a and 2.15b to obtain

$$\exp\{2 \times (-0.04768)\} < \frac{1+\rho}{1-\rho} < \exp\{2 \times (1.4341)\}$$

results in $L_1$ = 0.90905 and $U_1$ = 17.61. Thus from Equation 2.17 the 95%
confidence interval for ρ is

$$\frac{0.90905 - 1}{0.90905 + 1} < \rho < \frac{17.61 - 1}{17.61 + 1},$$

which reduces to

$$-0.048 < \rho < 0.8925.$$
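
The worked example above can be checked numerically. The following is a minimal sketch, assuming Python and a hypothetical helper name `fisher_ci`; it relies on the identity (e^{2x} - 1)/(e^{2x} + 1) = tanh(x), so the limits of Equation 2.17 are simply the hyperbolic tangents of the limits for E(Z).

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence limits for rho from Equations 2.9 to 2.17.

    r is the sample correlation coefficient, n the sample size, and
    z_crit the normal quantile (1.96 for a 95% interval).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must exceed 3 so that sigma(Z) is defined")
    z = math.atanh(r)                # Equation 2.9
    sigma = 1.0 / math.sqrt(n - 3)   # square root of Equation 2.11
    lower = math.tanh(z - z_crit * sigma)  # (L1 - 1) / (L1 + 1)
    upper = math.tanh(z + z_crit * sigma)  # (U1 - 1) / (U1 + 1)
    return lower, upper
```

For R = 0.60 and n = 10, `fisher_ci(0.60, 10)` gives limits of approximately -0.048 and 0.892, in line with the hand calculation.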

Confidence interval size increases as the sample correlation coefficient R
approaches zero, and the largest confidence interval that could result will
occur when R = 0. Here

$$L_1 = \exp\left\{-\frac{3.92}{\sqrt{n-3}}\right\}$$

and

$$U_1 = \exp\left\{\frac{3.92}{\sqrt{n-3}}\right\},$$

so that the largest confidence interval size is

$$2A = \frac{\exp\left\{\frac{3.92}{\sqrt{n-3}}\right\} - 1}{\exp\left\{\frac{3.92}{\sqrt{n-3}}\right\} + 1} - \frac{\exp\left\{-\frac{3.92}{\sqrt{n-3}}\right\} - 1}{\exp\left\{-\frac{3.92}{\sqrt{n-3}}\right\} + 1}.$$


Results for this case are shown in Table 1. The table provides the largest
possible confidence interval sizes that could result for various sample sizes.
For example, if a 95% confidence interval for ρ is desired which is no greater
than 0.2, then a minimum sample of size 367 would guarantee that result.
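
As a cross-check, the R = 0 case can be written more compactly. This is a sketch that uses only the identity (e^{2x} - 1)/(e^{2x} + 1) = tanh x; the inversion for n in the second line is a consequence added here for illustration and is not a formula stated in the thesis.

```latex
\begin{aligned}
2A &= \tanh\!\left(\frac{1.96}{\sqrt{n-3}}\right) - \tanh\!\left(-\frac{1.96}{\sqrt{n-3}}\right)
    = 2\tanh\!\left(\frac{1.96}{\sqrt{n-3}}\right), \\
n  &= 3 + \left(\frac{1.96}{\tanh^{-1} A}\right)^{2}.
\end{aligned}
```

Evaluating the first line at n = 367 gives 2A of about 0.205, which agrees with the tabulated 0.20 when rounded to two decimal places.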


Table 1.       REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.00

Estimated      Sample     Lower         Upper         95% Confidence
Correlation    Size       Confidence    Confidence    Interval Size
Coefficient    = n        Limit         Limit         = 2A
-----------    -------    ----------    ----------    --------------
0.00           5,085      -0.025        0.025         0.05
0.00           3,638      -0.03         0.03          0.06
0.00           2,738      -0.035        0.035         0.07
0.00           2,128      -0.04         0.04          0.08
0.00           1,704      -0.045        0.045         0.09
0.00           1,395      -0.05         0.05          0.10
0.00           1,163      -0.055        0.055         0.11
0.00           984        -0.06         0.06          0.12
0.00           844        -0.065        0.065         0.13
0.00           732        -0.07         0.07          0.14
0.00           641        -0.075        0.075         0.15
0.00           565        -0.08         0.08          0.16
0.00           503        -0.085        0.085         0.17
0.00           450        -0.09         0.09          0.18
0.00           367        -0.10         0.10          0.20
0.00           333        -0.105        0.105         0.21
0.00           279        -0.115        0.115         0.23
0.00           237        -0.125        0.125         0.25
0.00           220        -0.13         0.13          0.26
0.00           190        -0.14         0.14          0.28
0.00           166        -0.15         0.15          0.30

This  chapter  explained  how  confidence  intervals  for  the  correlation 
coefficient  may  be  obtained.  The  next  chapter  will  present  methods  to 
determine  the  needed  sample  size  for  estimating  a  correlation  coefficient  by 
using  confidence  intervals. 

III.     SAMPLE SIZE FOR ESTIMATING A CORRELATION COEFFICIENT USING
CONFIDENCE INTERVALS

Chapter II discussed the Pearson product-moment correlation coefficient,
estimation of the population correlation coefficient, and confidence intervals
for the population correlation coefficient. This chapter will study sample size
determination for estimating the correlation coefficient, using the confidence
intervals and the normal approximation method that were explained in
Chapter II. Then, we will discuss how computer programs, graphs, and tables
can be developed and used to determine the sample size required to obtain a
desired 95% confidence interval for a sample correlation coefficient value.
The final part of this chapter will show the required sample sizes for different
sample correlation coefficient values.

A.     SAMPLE  SIZE  DETERMINATION  USING  THE  NORMAL  APPROXIMATION 
METHOD  FOR  THE  ESTIMATED  CORRELATION  COEFFICIENT  VALUE 

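Suppose a confidence interval of size 2A is desired for the correlation
coefficient. Then, from Equation 2.17,

$$2A = \text{Upper Confidence Limit} - \text{Lower Confidence Limit} = \frac{U_1-1}{U_1+1} - \frac{L_1-1}{L_1+1}, \tag{3.1}$$

where

$$U_1 = \exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) + 1.96\,\sigma(Z)\right]\right\} \tag{3.2}$$

and

$$L_1 = \exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) - 1.96\,\sigma(Z)\right]\right\}, \tag{3.3}$$

and from Equation 2.11

$$\sigma(Z) = \frac{1}{\sqrt{n-3}}. \tag{3.4}$$

Thus, the 95% confidence interval size (2A) will be equal to

$$2A = \frac{\exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) + \frac{1.96}{\sqrt{n-3}}\right]\right\} - 1}{\exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) + \frac{1.96}{\sqrt{n-3}}\right]\right\} + 1} - \frac{\exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) - \frac{1.96}{\sqrt{n-3}}\right]\right\} - 1}{\exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) - \frac{1.96}{\sqrt{n-3}}\right]\right\} + 1}. \tag{3.5}$$
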


If Equation 3.5 could be solved for n in terms of 2A, there would exist a
closed-form expression by which the needed sample size could be computed.
However, because of its complexity, Equation 3.5 is very hard, if not
impossible, to solve for n in terms of 2A. Although a closed-form expression
for n could not be obtained, a table can still be constructed using n as the
independent variable and solving for 2A. Such a table can then be used to
estimate the needed sample size, given a value for 2A.
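
The tabulation idea described above, treating n as the independent variable and evaluating 2A, can be sketched as a simple search. This is an illustration in Python under stated assumptions, not the thesis's APL program: the names `ci_width` and `smallest_n` are hypothetical, and the stopping rule (first n whose interval size does not exceed the target) is an assumption.

```python
import math

Z_CRIT = 1.96  # normal quantile for a 95% interval, as used in the chapter


def ci_width(r, n, z_crit=Z_CRIT):
    """Confidence interval size 2A of Equation 3.5 for a guessed R and size n."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r)
    half = z_crit / math.sqrt(n - 3)
    # (exp(2x) - 1) / (exp(2x) + 1) equals tanh(x), so both limits are tanh values
    return math.tanh(z + half) - math.tanh(z - half)


def smallest_n(r, target_width, n_max=1_000_000):
    """Smallest n whose interval size is at most target_width (linear search)."""
    for n in range(4, n_max + 1):
        # 2A shrinks as n grows, so the first n meeting the target is the minimum
        if ci_width(r, n) <= target_width:
            return n
    raise ValueError("no n up to n_max reaches the target width")
```

Under these assumptions `smallest_n(0.0, 0.205)` gives 367, the Table 1 entry for 2A = 0.20, which suggests (but does not confirm) that the tabulated sizes correspond to interval sizes rounded to two decimal places; requiring the size to be at most 0.2 exactly gives 385.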

However, a major difficulty still remains. From the form of $U_1$ and $L_1$ in
Equations 3.2 and 3.3, it is seen that in subtracting the lower confidence limit
from the upper confidence limit to obtain 2A, the sample result R does not
vanish. Therefore, looking at Equation 3.5, an estimate of the sample
correlation coefficient value must be made in order to determine the required
sample size.

It is a curious result that, in order to determine the sample size needed to
estimate a correlation coefficient ρ by a value R, a first guess must be made
at the result R of a sample not yet taken. However, in many cases some
advance knowledge about R will be available, e.g., whether it is positive or
negative, or whether it is greater or less than 0.5. It may not be likely that
R is very high (say, R < 0.8). In any event, the tables will show that n
is not extremely sensitive to the guessed value of R.

For example, suppose that the decision maker estimates R = 0.975 and
desires a confidence interval size of 0.10. Then the decision maker can find
n = 10 from Table 2. It is important to note that, when R = -0.975, the
confidence interval size is the same as for R = 0.975, but as can be seen from
Table 2 and Table 3 on page 14, the upper and lower bounds differ because of
the sign. For example, the lower and upper bounds will be 0.89 and 0.99
for R = 0.975, and -0.99 and -0.89 for R = -0.975. Because of this symmetry,
negative sample correlation coefficient values will not be discussed for the rest
of the study. Also, since our purpose is finding the sample size, there is no
interest in the upper and lower bounds, but only in the confidence interval
size.
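
The symmetry noted above follows from the form of the limits in Equation 2.17. A short sketch, using the identity (e^{2x} - 1)/(e^{2x} + 1) = tanh x and writing δ = 1.96/√(n-3) (a shorthand introduced here, not used in the thesis):

```latex
\begin{aligned}
\text{limits for } R:  &\quad \tanh\!\left(\tanh^{-1}R - \delta\right),\quad \tanh\!\left(\tanh^{-1}R + \delta\right), \\
\text{limits for } -R: &\quad \tanh\!\left(-\tanh^{-1}R - \delta\right) = -\tanh\!\left(\tanh^{-1}R + \delta\right),\quad
                               \tanh\!\left(-\tanh^{-1}R + \delta\right) = -\tanh\!\left(\tanh^{-1}R - \delta\right).
\end{aligned}
```

Because tanh is an odd function, the limits for -R are the negatives of those for R in reverse order, so the interval size 2A is unchanged.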


Table 2.       REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.975

Estimated      Sample     Lower         Upper         95% Confidence
Correlation    Size       Confidence    Confidence    Interval Size
Coefficient    = n        Limit         Limit         = 2A
-----------    -------    ----------    ----------    --------------
0.975          20         0.94          0.99          0.05
0.975          16         0.93          0.99          0.06
0.975          14         0.92          0.99          0.07
0.975          12         0.91          0.99          0.08
0.975          11         0.90          0.99          0.09
0.975          10         0.89          0.99          0.10
0.975          9          0.88          0.99          0.11
0.975          8          0.86          0.99          0.13
0.975          7          0.84          1.0           0.16
0.975          6          0.78          1.0           0.21
0.975          5          0.66          1.0           0.34

Table 3.       REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = -0.975

Estimated      Sample     Lower         Upper         95% Confidence
Correlation    Size       Confidence    Confidence    Interval Size
Coefficient    = n        Limit         Limit         = 2A
-----------    -------    ----------    ----------    --------------
-0.975         20         -0.99         -0.94         0.05
-0.975         16         -0.99         -0.93         0.06
-0.975         14         -0.99         -0.92         0.07
-0.975         12         -0.99         -0.91         0.08
-0.975         11         -0.99         -0.90         0.09
-0.975         10         -0.99         -0.89         0.10
-0.975         9          -0.99         -0.88         0.11
-0.975         8          -0.99         -0.86         0.13
-0.975         7          -1.0          -0.84         0.16
-0.975         6          -1.0          -0.78         0.21
-0.975         5          -1.0          -0.66         0.34

As a second example, suppose the estimate of the sample correlation
coefficient is 0.90 and we calculate the 95% confidence interval using the
normal approximation method. For a given sample size, this will yield the
confidence limits. Table 4 on page 16 shows the sample sizes required to
obtain various 95% confidence interval sizes 2A when R = 0.90.

An APL program named "Tez" was written to obtain the sample size, the
upper and lower confidence limits, and the confidence interval size (2A)
after inputting any estimated value for the sample correlation coefficient. For
R = 0.90, Table 4 on page 16 was constructed by executing this APL program,
and the program was used to create similar tables in this chapter and in
Appendix B. Table 5 on page 17, Table 6 on page 18, and Table 7 on page
19 show the required sample size for 95% confidence intervals using
R = 0.80, R = 0.75, and R = 0.10. Tables for the other values of R are in
Appendix C.

It is important to note that when R = ±1, the Z statistic in Equation 2.9
goes to infinity. Because of this, the sample size cannot be calculated for the
desired confidence interval when R = ±1. However, a guess of R = ±1 will
not be used.

Table 4.       REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.90

Estimated      Sample     Lower         Upper         95% Confidence
Correlation    Size       Confidence    Confidence    Interval Size
Coefficient    = n        Limit         Limit         = 2A
-----------    -------    ----------    ----------    --------------
0.90           195        0.87          0.92          0.05
0.90           160        0.87          0.93          0.06
0.90           120        0.86          0.93          0.07
0.90           95         0.85          0.93          0.08
0.90           75         0.85          0.94          0.09
0.90           60         0.84          0.94          0.10
0.90           50         0.83          0.94          0.11
0.90           45         0.82          0.94          0.12
0.90           40         0.82          0.95          0.13
0.90           35         0.81          0.95          0.14
0.90           30         0.80          0.95          0.15
0.90           27         0.79          0.96          0.16
0.90           25         0.78          0.95          0.17
0.90           23         0.78          0.96          0.18
0.90           21         0.77          0.95          0.19
0.90           20         0.76          0.96          0.20
0.90           19         0.75          0.95          0.21
0.90           17         0.74          0.95          0.22
0.90           16         0.73          0.97          0.24
0.90           15         0.72          0.97          0.25
0.90           14         0.71          0.97          0.26
0.90           13         0.69          0.97          0.28
0.90           12         0.67          0.97          0.30

Table 5.       REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.80

Estimated      Sample     Lower         Upper         95% Confidence
Correlation    Size       Confidence    Confidence    Interval Size
Coefficient    = n        Limit         Limit         = 2A
-----------    -------    ----------    ----------    --------------
0.80           664        0.77          0.82          0.05
0.80           477        0.77          0.83          0.06
0.80           360        0.76          0.83          0.07
0.80           282        0.75          0.83          0.08
0.80           226        0.75          0.84          0.09
0.80           186        0.74          0.84          0.10
0.80           156        0.74          0.85          0.11
0.80           133        0.73          0.85          0.12
0.80           115        0.72          0.85          0.13
0.80           101        0.72          0.86          0.14
0.80           89         0.71          0.86          0.15
0.80           79         0.71          0.87          0.16
0.80           71         0.70          0.87          0.17
0.80           64         0.69          0.87          0.18
0.80           58         0.68          0.87          0.19
0.80           53         0.68          0.88          0.20
0.80           49         0.67          0.88          0.21
0.80           45         0.66          0.88          0.22
0.80           42         0.66          0.89          0.23
0.80           39         0.65          0.89          0.24
0.80           36         0.64          0.89          0.25
0.80           34         0.63          0.89          0.26
0.80           32         0.63          0.90          0.27
0.80           30         0.62          0.90          0.28
0.80           28         0.61          0.90          0.29
0.80           27         0.60          0.90          0.30

Table 6.       REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.75

Estimated      Sample     Lower         Upper         95% Confidence
Correlation    Size       Confidence    Confidence    Interval Size
Coefficient    = n        Limit         Limit         = 2A
-----------    -------    ----------    ----------    --------------
0.75           983        0.72          0.78          0.05
0.75           707        0.72          0.78          0.06
0.75           534        0.71          0.78          0.07
0.75           412        0.70          0.78          0.08
0.75           332        0.70          0.79          0.09
0.75           272        0.69          0.79          0.10
0.75           228        0.69          0.80          0.11
0.75           193        0.68          0.80          0.12
0.75           167        0.68          0.81          0.13
0.75           145        0.67          0.81          0.14
0.75           123        0.66          0.81          0.15
0.75           113        0.66          0.82          0.16
0.75           102        0.65          0.82          0.17
0.75           92         0.64          0.83          0.18
0.75           83         0.64          0.83          0.19
0.75           75         0.63          0.83          0.20
0.75           69         0.62          0.83          0.21
0.75           63         0.62          0.84          0.22
0.75           58         0.61          0.84          0.23
0.75           54         0.60          0.84          0.24
0.75           50         0.60          0.85          0.25
0.75           47         0.59          0.85          0.26
0.75           44         0.58          0.86          0.27
0.75           41         0.58          0.86          0.28
0.75           39         0.57          0.86          0.29
0.75           37         0.56          0.86          0.30

Table 7.       REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.10

Estimated      Sample     Lower         Upper         95% Confidence
Correlation    Size       Confidence    Confidence    Interval Size
Coefficient    = n        Limit         Limit         = 2A
-----------    -------    ----------    ----------    --------------
0.10           4,998      0.07          0.13          0.05
0.10           3,582      0.07          0.13          0.06
0.10           2,682      0.06          0.14          0.07
0.10           2,089      0.06          0.14          0.08
0.10           1,677      0.05          0.14          0.09
0.10           1,369      0.05          0.15          0.10
0.10           1,140      0.04          0.15          0.11
0.10           965        0.04          0.16          0.12
0.10           828        0.03          0.17          0.13
0.10           717        0.03          0.17          0.14
0.10           629        0.03          0.18          0.15
0.10           555        0.02          0.18          0.16
0.10           495        0.01          0.18          0.17
0.10           443        0.01          0.19          0.18
0.10           398        0.00          0.19          0.19
0.10           360        0.00          0.20          0.20
0.10           327        -0.01         0.21          0.21
0.10           299        -0.01         0.21          0.22
0.10           274        -0.02         0.21          0.23
0.10           253        -0.02         0.22          0.24
0.10           234        -0.03         0.22          0.25
0.10           215        -0.03         0.23          0.26
0.10           200        -0.04         0.23          0.27
0.10           186        -0.04         0.24          0.28
0.10           174        -0.05         0.24          0.29
0.10           163        -0.05         0.25          0.30

Some graphs are provided which show the difference between sample
sizes for different guesses of the sample correlation coefficient value. These
graphs can also be used to determine the appropriate sample size for a
desired confidence interval. Figure 1 on page 21 shows the sample size and
confidence interval for R = 0.95 and R = 0.90. From this figure it is clear
that sample size increases as R decreases. Also, there is a high sensitivity
in sample size to the guess of R. However, when our guess of the correlation
is smaller (say, R less than 0.6), n will not be as sensitive. In reality, we
might assume R is not likely to be very high (say, R < 0.8).

Figure 1.       REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.95 AND R = 0.90

Figure 2 on page 22 shows the sample size and the confidence interval size
for R = 0.65 and R = 0.45, and Figure 3 on page 23 shows the sample size and
the confidence interval size for R = 0.55 and R = 0.35. From these two
figures we can find the required sample sizes approximately, and we see that
n is not too sensitive to the guessed value of R. The graphs for different
sample correlation coefficient values are helpful in presenting the differences
in sensitivity of the sample size. Other graphs with different sample
correlation coefficient values are in Appendix D.

Figure 2.       REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.65 AND R = 0.45

Figure 3.       REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.55 AND R = 0.35


B.     COMPARISON  OF  SAMPLE  SIZES  FOR  DIFFERENT  CORRELATION 
COEFFICIENT  VALUES 

A comparison of the results for different correlation coefficient values
shows that as the correlation coefficient gets larger in absolute value, the
required sample size gets smaller for a desired confidence interval size.
Table 8 on page 25 shows the results obtained from the computer program for
different combinations of sample correlation coefficient estimates and
confidence interval sizes. For example, if a confidence interval size 2A of 0.15
is desired, then the required sample size is 20 for R = 0.925, 30 for R = 0.90,
171 for R = 0.70, 363 for R = 0.5, and 641 for R = 0.0. Further, a confidence
interval of size 0.2 and a prior estimate of R = 0.6 could reduce the sample
size to n = 153, which is less than half the n = 367 observations that would
be required under total uncertainty about R (estimate R = 0.0).

How can this table be used when the sample correlation is not yet known?
The table can provide general guidance to relieve some of the mystery in
choosing the size of a sample. For example, if a maximum confidence interval
of size 0.2 is desired and the variables are assumed to be highly correlated
(R near 0.8), a sample of about 50 should work; if the correlation is assumed
small, then several hundred observations will be needed.
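
The search for a sample size can be sketched in a few lines of Python. This is
a minimal illustration, not the thesis program: it assumes the Fisher
transformation with standard deviation 1/sqrt(n - 3) and returns the smallest
n whose interval width does not exceed the requested size 2A. The function
names `interval_width` and `required_sample_size` and the parameters `z_crit`
and `n_max` are hypothetical. Because the stopping and rounding rules of the
thesis program are not shown here, the results need not reproduce Table 8.

```python
import math


def interval_width(r, n, z_crit=1.96):
    """Width of the Fisher-z confidence interval for a correlation estimate r."""
    z = math.atanh(r)                     # Z = 0.5 * ln((1 + r) / (1 - r))
    half = z_crit / math.sqrt(n - 3)      # z_crit * sigma(Z)
    return math.tanh(z + half) - math.tanh(z - half)


def required_sample_size(r, width, z_crit=1.96, n_max=100_000):
    """Smallest n whose interval width does not exceed `width` (2A)."""
    for n in range(4, n_max + 1):         # sigma(Z) is defined only for n > 3
        if interval_width(r, n, z_crit) <= width:
            return n
    raise ValueError("no sample size up to n_max meets the requested width")
```

Calling `required_sample_size(0.6, 0.2)` and `required_sample_size(0.0, 0.2)`
shows the same qualitative behavior as the table: a larger assumed |R| needs
fewer observations for the same interval size.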
Table 8. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL

95% Confidence        Estimated sample correlation coefficient value
Interval Size
= 2A               0.0     0.1     0.5     0.6     0.7     0.8     0.9   0.925

0.05             5,085   4,998   2,860   2,082   1,326     664     195     113
0.06             3,638   3,582   2,049   1,495     945     477     160      83
0.07             2,738   2,682   1,542   1,123     715     360     120      64
0.08             2,128   2,089   1,199     875     558     282      95      52
0.09             1,704   1,677     961     701     448     226      75      43
0.10             1,395   1,369     787     575     367     186      60      36
0.11             1,163   1,140     657     480     307     156      50      31
0.12               984     965     556     407     260     133      45      28
0.13               844     828     477     349     224     115      40      25
0.14               732     717     414     303     195     101      35      22
0.15               641     629     363     266     171      89      30      20
0.16               565     555     320     235     151      79      27      19
0.17               503     495     285     209     135      71      25      17
0.18               450     443     255     190     121      64      23      15
0.19               405     398     230     169     110      58      21      15
0.20               367     360     209     153     100      53      20      14
0.21               333     327     190     140      91      49      19      13
0.22               304     299     174     128      84      45      17      13
0.23               279     274     159     118      77      42      17      12
0.24               257     253     147     109      71      39      16      12
0.25               237     234     136     100      66      36      15      11
0.26               220     215     126      93      62      34      14      11
0.27               204     200     117      87      57      32      14      11
0.28               190     186     109      81      54      30      13      10
0.29               178     174     102      76      51      28      13      10
0.30               166     163      96      71      43      27      12       9

This chapter has discussed the problem of determining a sample size to
estimate a correlation coefficient for a desired confidence interval when
Pearson's R is used. The next chapter will present two nonparametric
measures of correlation (Spearman's and Kendall's test statistics) and explore
the problem of finding a sample size for these cases.
IV.      NONPARAMETRIC  MEASURES  OF  CORRELATION,  AND  SAMPLE  SIZE

In the late 1930's a different approach to the problem of finding
probabilities began to gather some momentum. This new package of statistical
procedures became known as "nonparametric statistics." The methods often
involve less computational work, and therefore are often easier and quicker
to apply than other statistical methods. [Ref. 5: p. 3]

Included among methods described as "nonparametric" are procedures
providing a measure of correlation when the bivariate data (X, Y) are on strict
ordinal scales. An example of such data for sample size five could be

X  Y

2  4

3  2

5  3

1  1

4  5,

where $X_1 < X_2$ would imply that $X_2$ possesses more of the property being
measured than $X_1$, and $Y_1 < Y_2$ would imply that $Y_2$ possesses more of
the property than $Y_1$. Ordinal data can arise directly from the measuring
procedure in the experiment, or can be obtained from interval or ratio scaled
data. An example of the latter would be bivariate data of the temperatures in
centigrade in Istanbul and Izmir for five days:

X  Y 

23  27 

25  22 

30  24 

22  20 

27  31. 

When  reduced  from  an  interval  to  an  ordinal  scale,  this  data  would  be  as  in  the 
example  shown  above. 
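
The reduction from an interval scale to an ordinal scale can be sketched as
follows. This is only an illustration under the assumption that no two values
in a group are equal; the helper name `to_ranks` and the variable names are
hypothetical.

```python
def to_ranks(values):
    """Rank distinct values from smallest (rank 1) to largest (rank n)."""
    order = sorted(values)
    # The position of each value in the sorted list is its rank.
    return [order.index(v) + 1 for v in values]


x_temps = [23, 25, 30, 22, 27]   # X column of the temperature data
y_temps = [27, 22, 24, 20, 31]   # Y column of the temperature data

x_ranks = to_ranks(x_temps)
y_ranks = to_ranks(y_temps)
```

The resulting rank pairs are the ordinal example given at the start of this
chapter.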
In the previous chapter we discussed sample size determination for
estimating a correlation coefficient using confidence intervals and compared
the sample sizes for different sample correlation coefficient values. In general,
the sampling distribution of R depends upon the form of the bivariate
population from which the sample of pairs is drawn. More importantly,
Pearson's R as a correlation measure requires that the data be on an interval
or ratio scale.

Here we will discuss the Spearman and Kendall measures of correlation.
First, some of the theory and examples of Spearman's measure of correlation
will be provided. Then the use of the normal approximation method with
Spearman's r will be discussed, along with how confidence intervals can be
constructed. The next part of this chapter will summarize the theory and
give examples of Kendall's measure of correlation. Likewise, the use of the
normal approximation method to find confidence intervals with Kendall's
$\tau_s$ will be presented. The final part of this chapter will look at the
results that can be obtained from Pearson's R, Spearman's r, and Kendall's
$\tau_s$, and compare the sample sizes obtained from these three methods.

A.     SPEARMAN'S  R 

For this thesis, we let "r" be the notation for Spearman's coefficient of rank
correlation. It is usually designated by $\rho$ (rho), but that notation would
cause confusion with the population correlation coefficient, which is also
written $\rho$.

In general, the sampling distribution of R depends upon the bivariate
population from which the sample of pairs is drawn. But suppose that the X
observations are ranked from smallest to largest using the integers
1, 2, 3, ..., n, and the Y observations are ranked the same way. In other
words, each observation is assigned a rank according to its magnitude
relative to the others in its own group. Then the data consist of n sets of
paired ranks, and using these pairs, R as defined in Equation 2.5 can be
calculated. The resulting statistic is called Spearman's coefficient of rank
correlation (r). The difference between Pearson's R and Spearman's r is that
Spearman's r measures the degree of correspondence between rankings instead
of actual variate values. However, it can still be considered a measure of
correlation between X and Y in the continuous bivariate population. Let

$$ R(X_i) = \mathrm{rank}(X_i) $$

and

$$ R(Y_i) = \mathrm{rank}(Y_i). $$


Spearman's coefficient of rank correlation is

$$ r = \frac{12 \sum_{i=1}^{n} \left[R(X_i) - \bar{R}(X)\right]\left[R(Y_i) - \bar{R}(Y)\right]}{n(n^2 - 1)} \tag{4.1} $$


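and if the data are replaced by their ranks, then $\bar{X}$ and $\bar{Y}$
correspond to $\bar{R}(X)$ and $\bar{R}(Y)$, which can be calculated as

$$ \bar{R}(X) = \frac{1}{n} \sum_{i=1}^{n} R(X_i) = \frac{1}{n} \sum_{i=1}^{n} i = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2} \tag{4.2} $$

and in the same way

$$ \bar{R}(Y) = \frac{n+1}{2}. \tag{4.3} $$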


Then, Spearman's coefficient of rank correlation becomes

$$ r = \frac{12 \sum_{i=1}^{n} \left[R(X_i) - \frac{n+1}{2}\right]\left[R(Y_i) - \frac{n+1}{2}\right]}{n(n^2 - 1)} \tag{4.4} $$

[Ref. 5: p. 246]. An equivalent but computationally easier form is given by

$$ r = 1 - \frac{6 \sum_{i=1}^{n} \left[R(X_i) - R(Y_i)\right]^2}{n(n^2 - 1)} \tag{4.5} $$

and if we take $T = \sum_{i=1}^{n} \left[R(X_i) - R(Y_i)\right]^2$, then
Equation 4.5 becomes

$$ r = 1 - \frac{6T}{n(n^2 - 1)}. \tag{4.6} $$


It is important to note that Equations 4.5 and 4.6 are equivalent to 4.4 only
if there are no ties.

If a small number of ties are present in the data, Equations 4.5 and 4.6 can
be used because of their simplicity, and there will be very little difference
between the two coefficients obtained from 2.5 and 4.5. If there are many
ties, then Pearson's R in Equation 2.5 should be used on the ranks as
described below. In this manner, $\bar{X}$ and $\bar{Y}$ correspond to
$\bar{R}(X)$ and $\bar{R}(Y)$ as explained before, and
$\sum (X_i - \bar{X})^2$ and $\sum (Y_i - \bar{Y})^2$ correspond to

$$
\begin{aligned}
\sum_{i=1}^{n} \left[R(X_i) - \bar{R}(X)\right]^2
  &= \sum_{i=1}^{n} \left(i - \frac{n+1}{2}\right)^2
   = \sum_{i=1}^{n} \left[i^2 - i(n+1) + \frac{(n+1)^2}{4}\right] \\
  &= \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)^2}{2} + \frac{n(n+1)^2}{4}
   = \frac{n(n^2 - 1)}{12}.
\end{aligned}
\tag{4.7}
$$

In the same way

$$ \sum_{i=1}^{n} \left[R(Y_i) - \bar{R}(Y)\right]^2 = \frac{n(n^2 - 1)}{12}. \tag{4.8} $$

/=1 

Thus  Equation  2.5  becomes  Equation  4.4,  and  this  means  that  Pearson's  R 
reduces  to  Spearman's  r  when  the  data  are  replaced  by  their  ranks. 
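
The step from Equation 4.4 to the shortcut form in Equations 4.5 and 4.6 is not
spelled out in the text. A short derivation, added here as an illustration and
valid only when there are no ties, uses the sums of squares in Equations 4.7
and 4.8. The centered ranks $a_i$ and $b_i$ are notation introduced for this
sketch:

```latex
\begin{aligned}
a_i &= R(X_i) - \tfrac{n+1}{2}, \qquad b_i = R(Y_i) - \tfrac{n+1}{2},
  \qquad a_i - b_i = R(X_i) - R(Y_i), \\
T &= \sum_{i=1}^{n} (a_i - b_i)^2
   = \sum_{i=1}^{n} a_i^2 + \sum_{i=1}^{n} b_i^2 - 2\sum_{i=1}^{n} a_i b_i
   = \frac{n(n^2-1)}{6} - 2\sum_{i=1}^{n} a_i b_i, \\
\sum_{i=1}^{n} a_i b_i &= \frac{n(n^2-1)}{12} - \frac{T}{2}, \\
r &= \frac{12 \sum_{i=1}^{n} a_i b_i}{n(n^2-1)} = 1 - \frac{6T}{n(n^2-1)}.
\end{aligned}
```

With ties, the two sums of squares are smaller than $n(n^2-1)/12$, which is
why the three forms give slightly different values in that case.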

The following example shows the difference between Pearson's R and
Spearman's r. Take the 12 pairs of data (86, 88), (71, 77), (77, 76),
(68, 64), (91, 96), (72, 72), (77, 65), (91, 90), (70, 65), (71, 80),
(88, 81), (87, 72), and suppose these are the math and the English scores of
12 students. The math scores of the students, arranged in increasing order,
are

$X_i$ = 68 70 71 71 72 77 77 86 87 88 91 91,

and the English scores of the students, arranged in increasing order, are

$Y_i$ = 64 65 65 72 72 76 77 80 81 88 90 96.

There are 3 pairs of ties in the X variable and 2 pairs of ties in the Y
variable. Each tied pair is given the average of the ranks it occupies. For
example, the first tie occurs at X = 71, which occupies ranks 3 and 4; thus
the rank will be (3 + 4)/2 = 3.5. The other pairs of ties were similarly
ranked, and the result, in the original order of the data, is

$R(X_i)$ = 8, 3.5, 6.5, 1, 11.5, 5, 6.5, 11.5, 2, 3.5, 10, 9

and

$R(Y_i)$ = 10, 7, 6, 1, 12, 4.5, 2.5, 11, 2.5, 8, 9, 4.5.

By using these values we can calculate

$[R(X_i) - R(Y_i)]^2$ = 4, 12.25, 0.25, 0, 0.25, 0.25, 16, 0.25, 0.25, 20.25, 1, 20.25

and then calculate the statistic T in Equation 4.6 as

$$ T = \sum_{i=1}^{12} \left[R(X_i) - R(Y_i)\right]^2 = 75. $$

Then r is obtained from Equation 4.6 as

$$ r = 1 - \frac{6T}{n(n^2 - 1)} = 1 - \frac{6(75)}{12(143)} = 0.7378. $$

Using Equation 4.4 to calculate the r value results in r = 0.729, and using
Equations 2.13 and 2.14 to calculate Pearson's R on the ranks gives
R = 0.7354. As can be seen, there are only very small differences between
these values.
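
The three calculations in this example can be checked with a short Python
sketch. It is an illustration only: the helper `average_ranks` and the
variable names are hypothetical, and the last value is computed as Pearson's
product-moment formula applied directly to the ranks, which is assumed to be
what Equations 2.13 and 2.14 express.

```python
import math


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(values)
    ranks = []
    for v in values:
        first = order.index(v) + 1        # lowest rank occupied by v
        count = order.count(v)            # number of observations tied at v
        ranks.append(first + (count - 1) / 2)
    return ranks


math_scores = [86, 71, 77, 68, 91, 72, 77, 91, 70, 71, 88, 87]
english_scores = [88, 77, 76, 64, 96, 72, 65, 90, 65, 80, 81, 72]

rx = average_ranks(math_scores)
ry = average_ranks(english_scores)
n = len(rx)
mean_rank = (n + 1) / 2                   # Equations 4.2 and 4.3

# Equation 4.6: shortcut based on the squared rank differences.
T = sum((a - b) ** 2 for a, b in zip(rx, ry))
r_shortcut = 1 - 6 * T / (n * (n ** 2 - 1))

# Equation 4.4: centered cross-products with the no-ties denominator.
cross = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(rx, ry))
r_eq44 = 12 * cross / (n * (n ** 2 - 1))

# Pearson's R on the ranks: the sums of squares are computed from the
# ranks themselves, so the ties are accounted for.
ssx = sum((a - mean_rank) ** 2 for a in rx)
ssy = sum((b - mean_rank) ** 2 for b in ry)
r_pearson_on_ranks = cross / math.sqrt(ssx * ssy)
```

The three results differ only because of the five tied pairs; without ties all
three expressions coincide.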

B.     CONFIDENCE  INTERVALS  FOR  CORRELATION  COEFFICIENT  WHEN  WE 
USE  SPEARMAN'S  R 

If X and Y are independent and continuous, then the population correlation
coefficient will be equal to zero, and the expected value of the sample
correlation coefficient will essentially be zero too, because $E[R] = \rho$.
The variance of the sample correlation coefficient is given by Equation 2.7,
from which it is clear that as the sample size gets bigger, the variance of
the sample correlation coefficient approaches zero.

To find a confidence interval for the population correlation coefficient by
using Spearman's r, the statistic will be

$$ Z = \frac{1}{2} \ln\left(\frac{1+r}{1-r}\right) = \tanh^{-1} r, \tag{4.9} $$

which is distributed approximately normally with expected value

$$ E(Z) = \frac{1}{2} \ln\left(\frac{1+\rho}{1-\rho}\right) \tag{4.10} $$

and variance

$$ \sigma^2(Z) = \frac{1}{n-3} \tag{4.11} $$

[Ref. 6: p. 463].

Using this transformation, the confidence interval for $\rho$ can be found.
Having calculated the estimate for $\rho$, namely r, we can compute Z and the
statistic

$$ K_2 = \frac{Z - \frac{1}{2} \ln\left(\frac{1+\rho}{1-\rho}\right)}{\sqrt{\frac{1}{n-3}}} = \frac{Z - E(Z)}{\sigma(Z)}, \tag{4.12} $$

which is approximately normally distributed with expected value equal to 0
and variance equal to 1.

Using the normal approximation, there is 95% certainty that

$$ -1.96 < \frac{Z - E(Z)}{\sigma(Z)} < 1.96, \tag{4.13} $$

and the 95% confidence interval of E(Z) will be

$$ \frac{1}{2} \ln\left(\frac{1+r}{1-r}\right) - 1.96\sigma < E(Z) = \frac{1}{2} \ln\left(\frac{1+\rho}{1-\rho}\right) < \frac{1}{2} \ln\left(\frac{1+r}{1-r}\right) + 1.96\sigma. \tag{4.14} $$


Equation 4.10 may be used to obtain

$$ \exp\left\{2\left[\frac{1}{2} \ln\left(\frac{1+r}{1-r}\right) - 1.96\sigma\right]\right\} < \frac{1+\rho}{1-\rho} \tag{4.15a} $$

and

$$ \frac{1+\rho}{1-\rho} < \exp\left\{2\left[\frac{1}{2} \ln\left(\frac{1+r}{1-r}\right) + 1.96\sigma\right]\right\}. \tag{4.15b} $$

If the left side of 4.15a is $L_2$ and the right side of 4.15b is $U_2$, then

$$ L_2 < \frac{1+\rho}{1-\rho} < U_2, \tag{4.16} $$

and from this equation the 95% confidence interval for $\rho$ will be

$$ \frac{L_2 - 1}{L_2 + 1} < \rho < \frac{U_2 - 1}{U_2 + 1}. \tag{4.17} $$


Spearman's r can be used to find a confidence interval for a population
correlation coefficient by using the normal approximation method. It is very
important to note that this approximation assumes that the observations
(X, Y) are independent. If these bivariate observations are independent, then
the two measures of correlation (Pearson's R and Spearman's r) will be almost
equal. Thus, both of these methods can be used to find a confidence interval.
If the observations are not independent, then Spearman's r cannot be used in
place of Pearson's R. Again, the largest sample size that could occur for a
desired interval size will occur when r = 0, and we call this the worst case.
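
Equations 4.9 through 4.17 translate directly into a short computation. The
Python sketch below is an illustration rather than the thesis program; the
function name `fisher_interval` and the parameter `z_crit` are hypothetical,
and the values r = 0.729 and n = 12 are taken from the example in Section A.

```python
import math


def fisher_interval(coef, n, z_crit=1.96):
    """Confidence interval for rho following Equations 4.9 to 4.17."""
    z = math.atanh(coef)                    # Equation 4.9
    sigma = 1 / math.sqrt(n - 3)            # square root of Equation 4.11
    lower_ratio = math.exp(2 * (z - z_crit * sigma))   # L2, Equation 4.15a
    upper_ratio = math.exp(2 * (z + z_crit * sigma))   # U2, Equation 4.15b
    lower = (lower_ratio - 1) / (lower_ratio + 1)      # Equation 4.17
    upper = (upper_ratio - 1) / (upper_ratio + 1)
    return lower, upper


low, high = fisher_interval(0.729, 12)
```

For these inputs the interval runs from roughly 0.27 to 0.92. Small
differences from hand calculations can arise from rounding the standard
deviation before it is used.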

C.      KENDALL'S  TAU 

Another measure of correlation is Kendall's tau ($\tau_s$), which is usually
considered more difficult to obtain than Spearman's r. The basic advantage
of Kendall's $\tau_s$ is that its distribution approaches the normal
distribution quite rapidly, so that the normal approximation is better for
Kendall's $\tau_s$ than it is for Spearman's r. Another advantage of the
Kendall test statistic is its direct and simple interpretation in terms of
probabilities of observing concordant and discordant pairs. [Ref. 5: p. 356]

For any two independent pairs of random variables $(X_i, Y_i)$ and
$(X_j, Y_j)$, we denote by $p_c$ and $p_d$ the probabilities of concordance
and discordance. Two observations are called concordant if both members of
one observation are larger than their respective members of the other
observation, and are called discordant otherwise. For example, (2.3, 3.5) and
(2.6, 1.7) are discordant, because 2.6 > 2.3 while 1.7 < 3.5. The
probabilities $p_c$ and $p_d$ can be defined as

$$
\begin{aligned}
p_c &= P\{[(X_i < X_j) \cap (Y_i < Y_j)] \cup [(X_i > X_j) \cap (Y_i > Y_j)]\} \\
    &= P[(X_j - X_i)(Y_j - Y_i) > 0] \\
    &= P[(X_i < X_j) \cap (Y_i < Y_j)] + P[(X_i > X_j) \cap (Y_i > Y_j)]
\end{aligned}
\tag{4.18}
$$

and

$$
\begin{aligned}
p_d &= P[(X_j - X_i)(Y_j - Y_i) < 0] \\
    &= P[(X_i < X_j) \cap (Y_i > Y_j)] + P[(X_i > X_j) \cap (Y_i < Y_j)]
\end{aligned}
\tag{4.19}
$$

[Ref. 4: p. 208].

If there is a perfect correlation between X and Y, then there is either
perfect concordance or perfect discordance. The Kendall coefficient $\tau$ is
defined as the difference

$$ \tau = p_c - p_d. \tag{4.20} $$

If the marginal probability distributions of X and Y are continuous, so that
the possibility of ties $X_i = X_j$ or $Y_i = Y_j$ within groups is
eliminated, we have

$$
\begin{aligned}
p_c &= \{P(Y_i < Y_j) - P[(X_i > X_j) \cap (Y_i < Y_j)]\} \\
    &\quad + \{P(Y_i > Y_j) - P[(X_i < X_j) \cap (Y_i > Y_j)]\}.
\end{aligned}
\tag{4.21}
$$

Thus,

$$ p_c = P(Y_i < Y_j) + P(Y_i > Y_j) - p_d = 1 - p_d. $$

In this case, $\tau$ can be expressed as

$$ \tau = 2p_c - 1 = 1 - 2p_d \tag{4.22} $$

[Ref. 4: p. 208].

If X and Y are independent and continuous random variables, then $p_c$ must
be equal to $p_d$, and so we find $\tau = 0$. This means that for independent
and continuous random variables $\tau$ will be equal to zero. In general, the
converse is not true. [Ref. 4: p. 208]

All of this explanation concerns the population. However, we are interested
in the sample. If there are n observations, then these n observations may be
paired in $\binom{n}{2} = \frac{n(n-1)}{2}$ different ways. Suppose we
compare all pairs and determine the number of concordant pairs and the
number of discordant pairs. Let $c_i$ be the number of concordant pairs that
the i-th observation forms with the observations listed after it. Then, an
unbiased estimate of $p_c$ will be

$$ \hat{p}_c = \sum_{i=1}^{n} \frac{2c_i}{n(n-1)}. \tag{4.23} $$

Now let $d_i$ be the corresponding number of discordant pairs; then

$$ \hat{p}_d = \sum_{i=1}^{n} \frac{2d_i}{n(n-1)} \tag{4.24} $$

will give an unbiased estimate of $p_d$. A measure of correlation of the
sample will be

$$ \tau_s = (\hat{p}_c - \hat{p}_d) \tag{4.25} $$

[Ref. 4: p. 210]. This is Kendall's sample tau coefficient $\tau_s$, which is
an unbiased estimator of the parameter $\tau$ in any bivariate distribution.
"It is important to note that the variance of $\tau_s$ approaches zero as the
sample size approaches infinity." [Ref. 4: p. 211]

Using the same data that we used in the Spearman example to calculate r,
Kendall's $\tau_s$ will be calculated. Arrangement of the data $(X_i, Y_i)$
according to increasing values of X gives these pairs of observations:
(68, 64), (70, 65), (71, 77), (71, 80), (72, 72), (77, 65), (77, 76),
(86, 88), (87, 72), (88, 81), (91, 90), (91, 96). There are ties in the X
scores 71, 77, and 91. We calculate

$c_i$ = 11, 9, 4, 4, 5, 5, 4, 2, 3, 2, 0, 0

and

$d_i$ = 0, 0, 4, 4, 1, 0, 1, 2, 0, 0, 0, 0,

and by using Equations 4.23 and 4.24 we find that

$$ \hat{p}_c = \frac{2 \times 49}{12(11)} = 0.7424 \qquad \text{and} \qquad \hat{p}_d = \frac{2 \times 12}{12(11)} = 0.1818. $$

From Equation 4.25,

$$ \tau_s = (\hat{p}_c - \hat{p}_d) = (0.7424 - 0.1818) = 0.5606 $$

estimates a positive correlation between these variables. We already found
r = 0.7378 with the same data. In general, the absolute value of Spearman's r
will tend to be larger than the absolute value of Kendall's tau. As a test of
significance there is no strong reason to prefer one over the other, because
both usually give almost the same result. [Ref. 5: p. 251]
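
The counting of concordant and discordant pairs can be sketched in Python as
follows. This is an illustration only; the function name `concordance_counts`
and the variable names are hypothetical. A pair with a tie in X or in Y is
counted as neither concordant nor discordant, which is the convention the
counts in the example follow.

```python
def concordance_counts(pairs):
    """Count concordant and discordant pairs among all n(n-1)/2 comparisons."""
    concordant = discordant = 0
    n = len(pairs)
    for i in range(n):
        for j in range(i + 1, n):
            # Sign of (Xj - Xi)(Yj - Yi), as in Equations 4.18 and 4.19.
            product = (pairs[j][0] - pairs[i][0]) * (pairs[j][1] - pairs[i][1])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
            # product == 0 is a tie in X or Y and is counted as neither
    return concordant, discordant


scores = [(86, 88), (71, 77), (77, 76), (68, 64), (91, 96), (72, 72),
          (77, 65), (91, 90), (70, 65), (71, 80), (88, 81), (87, 72)]

n = len(scores)
c, d = concordance_counts(scores)
p_c_hat = 2 * c / (n * (n - 1))    # Equation 4.23
p_d_hat = 2 * d / (n * (n - 1))    # Equation 4.24
tau_s = p_c_hat - p_d_hat          # Equation 4.25
```

Because every pair is compared exactly once, the totals do not depend on the
order in which the observations are listed, so the data need not be sorted by
X first.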

D.     CONFIDENCE  INTERVALS  FOR  CORRELATION  COEFFICIENT  WHEN  WE
USE  KENDALL'S  TAU

To find a confidence interval for the population correlation coefficient by
using Kendall's $\tau_s$, the Z statistic will be

$$ Z = \frac{1}{2} \ln\left(\frac{1+\tau_s}{1-\tau_s}\right) = \tanh^{-1} \tau_s, \tag{4.26} $$

which is approximately normally distributed with the expected value given in
Equation 4.10 and variance given in Equation 4.11. [Ref. 6: p. 463]
Again, normalization on Z can be accomplished, yielding

$$ K_3 = \frac{Z - \frac{1}{2} \ln\left(\frac{1+\rho}{1-\rho}\right)}{\sqrt{\frac{1}{n-3}}} = \frac{Z - E(Z)}{\sigma(Z)}, \tag{4.27} $$

which is approximately normally distributed with expected value equal to 0
and variance equal to 1.

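Using the normal approximation, there is 95% certainty that

$$ -1.96 < \frac{Z - E(Z)}{\sigma(Z)} < 1.96, \tag{4.28} $$

and a 95% confidence interval of E(Z) will be

$$ \frac{1}{2} \ln\left(\frac{1+\tau_s}{1-\tau_s}\right) - 1.96\sigma < E(Z) = \frac{1}{2} \ln\left(\frac{1+\rho}{1-\rho}\right) < \frac{1}{2} \ln\left(\frac{1+\tau_s}{1-\tau_s}\right) + 1.96\sigma, \tag{4.29} $$

and

$$ \exp\left\{2\left[\frac{1}{2} \ln\left(\frac{1+\tau_s}{1-\tau_s}\right) - 1.96\sigma\right]\right\} < \frac{1+\rho}{1-\rho}, \tag{4.30a} $$

and

$$ \frac{1+\rho}{1-\rho} < \exp\left\{2\left[\frac{1}{2} \ln\left(\frac{1+\tau_s}{1-\tau_s}\right) + 1.96\sigma\right]\right\}. \tag{4.30b} $$

If the left side of 4.30a is called $L_3$ and the right side of 4.30b is
called $U_3$, then

$$ L_3 < \frac{1+\rho}{1-\rho} < U_3, \tag{4.31} $$

and from this the 95% confidence interval for $\rho$ will be

$$ \frac{L_3 - 1}{L_3 + 1} < \rho < \frac{U_3 - 1}{U_3 + 1}. \tag{4.32} $$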


Thus Kendall's $\tau_s$ can be used in place of Pearson's R to find a
confidence interval for the population correlation coefficient by using the
normal approximation method explained above. Again, if Kendall's $\tau_s$ is
used in place of Pearson's R, the observations need to be independent. If
they are not independent, then the normal approximation method cannot be
used to find a sample size for a desired confidence interval.
E.  SAMPLE  SIZE  DETERMINATION  FOR  THE  NONPARAMETRIC  MEASURES

If the nonparametric measures of correlation are to be used to obtain a
confidence interval, the bivariate observations must be independent. If they
are not independent, then the nonparametric measures of correlation cannot
be used. If the variables are not independent, then the population
correlation coefficient value will be different from zero, and in that case
the standard normal approximation cannot be used. All that is known is that
if X and Y come from independent bivariate observations, then use of the
normal approximation method to find a confidence interval for the population
correlation coefficient is valid. For this reason, Pearson's R is used for
determining sample size when using the normal approximation method that was
explained in Chapter II.

F.  THE  RELATION  BETWEEN  PEARSON'S  R,  SPEARMAN'S  R,  AND  KENDALL'S
TAU

If the data are at least interval scaled with independent observations, then
all three measures of correlation can be used to find a maximum sample size
for a desired confidence interval by using the normal approximation method.
To see the difference between these three methods, let's use the same 12
sample data pairs we used in Chapter IV, Section A. Previous computations
from the data resulted in r = 0.729, R = 0.7354 and $\tau_s$ = 0.5606. If the
confidence interval for the population correlation coefficient is calculated
by using R = 0.7374, the statistic will be

$$ Z = \frac{1}{2} \ln\left(\frac{1.7374}{0.2626}\right) = 0.9448, $$

and standard deviation

$$ \sigma(Z) = \frac{1}{\sqrt{n-3}} = 0.334. $$

The 95 percent confidence limits for E(Z) are

$$ 0.9448 - 1.96 \times 0.334 < E(Z) < 0.9448 + 1.96 \times 0.334, $$

which reduce to

$$ 0.2902 < E(Z) < 1.5995. $$

The inequalities can be written as

$$ 0.2902 < \frac{1}{2} \ln\left(\frac{1+\rho}{1-\rho}\right) < 1.5995, $$

and from Equations 2.15a and 2.15b

$$ \exp\{2 \times (0.2902)\} < \frac{1+\rho}{1-\rho} < \exp\{2 \times (1.5995)\}. $$

Calculating $L_1$ = 1.7868 and $U_1$ = 24.508 and applying Equation 2.17, the
confidence interval for $\rho$ will be

$$ \frac{1.7868 - 1}{1.7868 + 1} < \rho < \frac{24.508 - 1}{24.508 + 1}, $$

which reduces to

$$ 0.2823 < \rho < 0.9216. $$

So, the 95% confidence interval for $\rho$ by using Pearson's R is
$0.2823 < \rho < 0.9216$, and the confidence interval size (2A) is 0.6393.
If Spearman's r is used with r = 0.729, then

$$ Z = \frac{1}{2} \ln\left(\frac{1.729}{0.271}\right) = 0.9266 $$

and $\sigma(Z)$ is the same as with Pearson's R. The 95 percent confidence
limits for E(Z) are

$$ 0.9266 - 1.96 \times 0.334 < E(Z) < 0.9266 + 1.96 \times 0.334, $$

which reduce to

$$ 0.272 < E(Z) < 1.5813. $$

The inequalities can be written as

$$ 0.272 < \frac{1}{2} \ln\left(\frac{1+\rho}{1-\rho}\right) < 1.5813, $$

and from Equations 4.15a and 4.15b

$$ \exp\{2 \times (0.272)\} < \frac{1+\rho}{1-\rho} < \exp\{2 \times (1.5813)\}. $$

Calculating $L_2$ = 1.7229 and $U_2$ = 23.632, the confidence interval for
$\rho$ is

$$ \frac{1.7229 - 1}{1.7229 + 1} < \rho < \frac{23.632 - 1}{23.632 + 1}, $$

yielding

$$ 0.2655 < \rho < 0.9188. $$

So, the 95% confidence interval for $\rho$ by using Spearman's r is
$0.2655 < \rho < 0.9188$, and the confidence interval size (2A) is 0.6533.
Finally, using Kendall's tau, $\tau_s$ = 0.5606, gives the statistic

$$ Z = \frac{1}{2} \ln\left(\frac{1.5606}{0.4394}\right) = 0.6337 $$

and $\sigma(Z)$ = 0.334. The 95 percent confidence limits for E(Z) are then

$$ 0.6337 - 1.96 \times 0.334 < E(Z) < 0.6337 + 1.96 \times 0.334, $$

which reduce to

$$ -0.021 < E(Z) < 1.2884. $$

The inequalities can be written as

$$ -0.021 < \frac{1}{2} \ln\left(\frac{1+\rho}{1-\rho}\right) < 1.2884, $$

and from Equations 4.30a and 4.30b

$$ \exp\{2 \times (-0.021)\} < \frac{1+\rho}{1-\rho} < \exp\{2 \times (1.2884)\}. $$

Again calculating $L_3$ = 0.9589 and $U_3$ = 13.155, the confidence interval
for $\rho$ will be

$$ \frac{0.9589 - 1}{0.9589 + 1} < \rho < \frac{13.155 - 1}{13.155 + 1}, $$

or

$$ -0.021 < \rho < 0.8587. $$

Thus the 95% confidence interval for $\rho$ using Kendall's tau is
$-0.021 < \rho < 0.8587$, and the confidence interval size (2A) is 0.8797.

As can be seen from these three results, Pearson's R and Spearman's r
give approximately the same confidence interval size (2A). However, the
confidence interval size that was obtained from Kendall's $\tau_s$ is
noticeably larger than the others. This seems to be a disadvantage for
Kendall's tau, but Conover states that there is no strong reason to prefer
one over another, because they will generally give roughly the same result.
[Ref. 5: p. 251]

Graphs are provided to show the difference among sample sizes from
these three methods. At the same time these graphs can be used to determine
the appropriate sample size for a desired confidence interval. Figure 4
shows the sample size and the confidence interval for these three methods.

Figure 4. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL SIZE
BY USING DIFFERENT SAMPLE CORRELATION METHODS

A table can be developed to provide the exact values for the different sample
correlation coefficient methods. Table 9 shows these sample size values.
Table 9. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL SIZE
BY USING DIFFERENT SAMPLE CORRELATION METHODS

95% Confidence        Pearson's R    Spearman's r   Kendall's tau
Interval Size = 2A    = 0.7354       = 0.729        = 0.5606

0.05                  1,076          1,120          2,392
0.06                  772            804            1,714
0.07                  581            605            1,287
0.08                  454            472            1,002
0.09                  364            379            804
0.10                  299            311            659
0.11                  250            260            550
0.12                  212            221            466
0.13                  183            190            400
0.14                  159            165            347
0.15                  140            145            304
0.16                  124            129            269
0.17                  111            115            239
0.18                  100            104            214
0.19                  90             94             193
0.20                  82             85             175
0.21                  75             78             160
0.22                  69             72             146
0.23                  64             66             134
0.24                  59             61             124
0.25                  55             57             114
0.26                  51             53             106
0.27                  48             50             99
0.28                  45             46             92
0.29                  42             44             86
0.30                  40             41             81

This chapter explained Spearman's and Kendall's measures of correlation,
and the problems of finding a sample size for a desired confidence interval
size by using Spearman's r or Kendall's $\tau_s$. The next chapter will
summarize this study and give some suggestions for further research and study.

V.     SUMMARY  AND  SUGGESTION  FOR  FURTHER  RESEARCH  AND  STUDY

In this chapter, a summary will be given of the study of sample sizes for
desired confidence intervals when the classical sample correlation coefficient
method (Pearson's R) and the nonparametric statistical sample correlation
coefficient methods (Spearman's r and Kendall's $\tau_s$) are used.
Additionally, recommendations will be made for some additional study into the
reduction of the number of observations needed to obtain a desired confidence
interval for the correlation coefficient.

A.     SUMMARY

This study described the classical sample correlation coefficient (Pearson's
R) and the nonparametric statistical sample correlation coefficient methods
(Spearman's r and Kendall's $\tau_s$) to obtain the number of samples needed
for a desired confidence interval size for a correlation coefficient.

First, a description was provided of the Pearson product-moment
correlation coefficient, the estimated population correlation coefficient, and
confidence intervals for the population correlation coefficient obtained by
using the normal approximation method. In the next chapter, it was shown how
the sample size for estimating a correlation coefficient using the confidence
interval could be obtained, and these results were compared for different
sample correlation coefficient values. The result had the limitation that one
must guess at the sample result before taking the sample, but it was still
possible to give general results about the magnitude of needed sample sizes.
In Chapter IV, the Spearman and Kendall statistical sample correlation
coefficient methods were described. The analysis concluded that there is no
way to find a sample size by using the Spearman and Kendall statistical
sample correlation coefficient methods when rho is not equal to zero, due to
the absence of any information about the cumulative distribution function
when the population correlation coefficient is nonzero. Similarly, values for
probabilities, expected values, and variances could not be determined.
However, most of the time the value of the population correlation coefficient
is unknown. If the observations are independent, then a sample size for a
desired confidence interval using nonparametric measures of correlation can
be found. If the observations are not independent, then the normal
approximation method cannot be used with nonparametric statistics to find a
needed sample size, and instead Pearson's R must be used to find a sample
size.

To  use  the  normal  approximation  method,  the  decision  maker  must 
estimate  the  measure  of  correlation  value,  and  then  determine  the  desired 
sample  size  for  a  confidence  interval  of  size  (2A).  In  order  to  find  a  sample  for 
the  desired  confidence  interval,  the  sample  correlation  coefficient  must  first 
be  estimated  without  any  data. 

The  results  for  different  sample  correlation  coefficient  values  were 
compared,  and  it  was  observed  that  if  the  sample  correlation  coefficient  value 
gets  bigger  in  absolute  value  then  the  sample  size  gets  smaller,  and  the 
largest  sample  size  that  could  result  will  occur  when  R  equals  zero. 

Computer  programs  were  developed  to  calculate  sample  sizes  for  a 
desired  confidence  interval  for  different  sample  correlation  coefficient  values, 
and  some  tables  and  graphs  giving  the  sample  size  needed  for  different  R 
values  were  generated.  These  tables  and  graphs  can  be  used  by  a  decision 
maker  to  assist  in  determining  the  desired  sample  size. 

B.     SUGGESTIONS  FOR  FURTHER  STUDY 

In this study, 95% confidence intervals were used. It would be useful to
develop tables and graphs for other confidence levels, such as 90%, 97.5%,
and 99%.
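
As a starting point for such an extension, the sketch below shows one way to obtain the standard normal value for an arbitrary confidence level in Python, using the standard library's NormalDist. The function names critical_value and fisher_limits, and the example inputs R = 0.70 and n = 100, are assumptions made for illustration; the Fisher z interval is assumed to be the one used elsewhere in this thesis.

```python
import math
from statistics import NormalDist


def critical_value(level):
    """Two-sided standard normal critical value for a confidence level such as 0.95."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    return NormalDist().inv_cdf(0.5 + level / 2.0)


def fisher_limits(r, n, level=0.95):
    """Lower and upper limits for r at sample size n (Fisher z approximation, assumed)."""
    half = critical_value(level) / math.sqrt(n - 3)
    centre = math.atanh(r)
    return math.tanh(centre - half), math.tanh(centre + half)


# Illustrative inputs only: R = 0.70 and n = 100 are not taken from the thesis.
for level in (0.90, 0.95, 0.975, 0.99):
    low, high = fisher_limits(0.70, 100, level)
    print(f"level {level:.3f}: z = {critical_value(level):.3f}, "
          f"interval = ({low:.2f}, {high:.2f})")
```

Only the critical value changes with the confidence level; the rest of the calculation is the same as for 95%.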

The discussion of nonparametric statistics in this study centered on the
Spearman and Kendall test statistics. The study did not conclude that these
methods need smaller sample sizes than the classical Pearson method.
Additional research could search for appropriate sample sizes for other
nonparametric statistics.

It is sincerely hoped that the information about the sample size needed to
estimate correlation coefficients, together with the tables, graphs, and
computer programs in this thesis, will be beneficial to decision makers in
choosing the sample size for a desired confidence interval when estimating a
correlation coefficient.

APPENDIX A.  TABLE FOR CONFIDENCE BELTS FOR THE CORRELATION COEFFICIENT

Figure 5.  95% CONFIDENCE BELTS FOR THE CORRELATION COEFFICIENT: The vertical
axis of this figure shows ρ; the horizontal axis shows R [Ref. 6: p. 545].

APPENDIX B.  THE APL PROGRAM "TEZ" USED TO COMPUTE CONFIDENCE
INTERVAL FOR DESIRED SAMPLE CORRELATION COEFFICIENT VALUE

     ∇ TEZ R
[1]  ⍝ THIS PROGRAM COMPUTES THE CONFIDENCE INTERVAL WITH
[2]  ⍝ ESTIMATED SAMPLE CORRELATION COEFFICIENT VALUE FOR
[3]  ⍝ DIFFERENT SAMPLE SIZE. TO RUN THE PROGRAM, ENTER
[4]  ⍝ DESIRED CORRELATION COEFFICIENT. IT TERMINATES THE
[5]  ⍝ EXECUTION WHEN THE SAMPLE SIZE IS > 200. FOR BIGGER
[6]  ⍝ NUMBERS, THE VALUE OF N IN LINE 29 MUST BE INCREASED
[7]  ⍝ TO DESIRED SAMPLE SIZE. IF CONFIDENCE LEVEL DIFFERENT
[8]  ⍝ THAN 95 PERCENT, THEN THE STANDARD PROBABILITY VALUE
[9]  ⍝ MUST BE CHANGED IN LINE 15 AND 16. IT IS IMPORTANT
[10] ⍝ TO NOTE THAT N CAN NOT BE LESS THAN 4.
[11] Z←(1÷2)×(⍟((1+R)÷(1-R)))
[12] N←4
[13] 'SAMPLE CORRELATION COEFFICIENT IS = ',5 3⍕R
[14] ' '
[15] ' '
[16] L1:SIGMA←1÷((N-3)*(1÷2))
[17] ⍝ 95 PERCENT C.I. FOR E(Z) ARE
[18] Z1←Z-(1.96×SIGMA)
[19] Z2←Z+(1.96×SIGMA)
[20] A←Z1×2
[21] A1←*A
[22] LOW←(A1-1)÷(A1+1)
[23] B←Z2×2
[24] B1←*B
[25] UPPER←(B1-1)÷(B1+1)
[26] A2←UPPER-LOW
[27] 'FOR SAMPLE SIZE = ',5 0⍕N
[28] 'CONFIDENCE INTERVAL IS = ',4 2⍕LOW,UPPER
[29] 'CONFIDENCE INTERVAL SIZE = 2A IS = ',4 2⍕A2
[30] ' '
[31] N←N+1
[32] →(N≤200)/L1
[33] ⍝
     ∇
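
For readers without an APL interpreter, a rough Python rendering of the same calculation might look like the following. It is a reading of the listing above rather than the author's code: the starting sample size of 4 is taken from the comment that N cannot be less than 4, and the function name tez_rows and its keyword arguments are made up for illustration.

```python
import math


def tez_rows(r, n_start=4, n_stop=200, z=1.96):
    """Yield (n, low, upper, size) for each sample size from n_start to n_stop."""
    big_z = 0.5 * math.log((1 + r) / (1 - r))   # Fisher z of the guessed r
    for n in range(n_start, n_stop + 1):        # step n by one up to the stopping value
        sigma = 1 / math.sqrt(n - 3)            # standard error on the z scale
        z1 = big_z - z * sigma                  # lower limit on the z scale
        z2 = big_z + z * sigma                  # upper limit on the z scale
        a1 = math.exp(2 * z1)
        b1 = math.exp(2 * z2)
        low = (a1 - 1) / (a1 + 1)               # back-transform the lower limit
        upper = (b1 - 1) / (b1 + 1)             # back-transform the upper limit
        yield n, low, upper, upper - low        # last item is the interval size 2A


if __name__ == "__main__":
    for n, low, upper, size in tez_rows(0.85):
        print(f"n = {n:3d}  interval = ({low:.2f}, {upper:.2f})  size 2A = {size:.2f}")
```

Returning rows from a generator instead of printing inside the loop makes it easy to filter for the first n that meets a target interval size.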

APPENDIX C.  TABLES FOR DESIRED SAMPLE SIZE USING DIFFERENT
ESTIMATED SAMPLE CORRELATION COEFFICIENT VALUES AND A 95%
CONFIDENCE LEVEL


Table 10.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.95

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.95          57        0.92         0.97         0.05
0.95          42        0.91         0.97         0.06
0.95          33        0.90         0.98         0.07
0.95          28        0.89         0.98         0.08
0.95          23        0.88         0.98         0.09
0.95          20        0.88         0.98         0.10
0.95          18        0.87         0.98         0.11
0.95          17        0.86         0.98         0.12
0.95          15        0.85         0.98         0.13
0.95          14        0.85         0.98         0.14
0.95          13        0.84         0.99         0.15
0.95          12        0.83         0.99         0.16
0.95          11        0.81         0.99         0.17
0.95          10        0.80         0.99         0.19
0.95          9         0.77         0.99         0.22
0.95          8         0.74         0.99         0.25
0.95          7         0.69         0.99         0.30


Table 11.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.925

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.925         113       0.90         0.95         0.05
0.925         83        0.89         0.95         0.06
0.925         64        0.88         0.95         0.07
0.925         52        0.87         0.96         0.08
0.925         43        0.86         0.96         0.09
0.925         36        0.86         0.96         0.10
0.925         31        0.85         0.96         0.11
0.925         28        0.84         0.97         0.12
0.925         25        0.84         0.97         0.13
0.925         22        0.83         0.97         0.14
0.925         20        0.82         0.97         0.15
0.925         19        0.81         0.97         0.16
0.925         17        0.80         0.97         0.17
0.925         16        0.79         0.97         0.18
0.925         15        0.78         0.98         0.19
0.925         14        0.77         0.98         0.20
0.925         13        0.76         0.98         0.21
0.925         12        0.75         0.98         0.23
0.925         11        0.73         0.98         0.25
0.925         10        0.71         0.98         0.28
0.925         9         0.68         0.98         0.30


Table 12.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.85

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.85          398       0.82         0.87         0.05
0.85          287       0.82         0.88         0.06
0.85          217       0.81         0.88         0.07
0.85          170       0.80         0.89         0.08
0.85          138       0.79         0.89         0.09
0.85          114       0.79         0.89         0.10
0.85          96        0.79         0.90         0.11
0.85          82        0.78         0.90         0.12
0.85          71        0.77         0.90         0.13
0.85          63        0.77         0.91         0.14
0.85          56        0.76         0.91         0.15
0.85          50        0.75         0.91         0.16
0.85          45        0.74         0.91         0.17
0.85          41        0.74         0.92         0.18
0.85          37        0.73         0.92         0.19
0.85          34        0.72         0.92         0.20
0.85          32        0.71         0.92         0.21
0.85          30        0.71         0.93         0.22
0.85          28        0.70         0.93         0.23
0.85          26        0.69         0.93         0.24
0.85          24        0.68         0.93         0.25
0.85          23        0.67         0.93         0.26
0.85          22        0.67         0.94         0.27
0.85          21        0.66         0.94         0.28
0.85          20        0.94         0.73         0.29
0.85          19        0.64         0.94         0.30


Table 13.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.70

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.70          1,326     0.67         0.72         0.05
0.70          745       0.67         0.73         0.06
0.70          715       0.66         0.73         0.07
0.70          558       0.66         0.74         0.08
0.70          448       0.65         0.74         0.09
0.70          367       0.65         0.75         0.10
0.70          307       0.64         0.75         0.11
0.70          260       0.64         0.76         0.12
0.70          224       0.63         0.76         0.13
0.70          195       0.62         0.77         0.14
0.70          171       0.62         0.77         0.15
0.70          151       0.61         0.77         0.16
0.70          135       0.60         0.77         0.17
0.70          121       0.60         0.78         0.18
0.70          110       0.59         0.78         0.19
0.70          100       0.59         0.79         0.20
0.70          91        0.58         0.79         0.21
0.70          84        0.57         0.79         0.22
0.70          77        0.57         0.80         0.23
0.70          71        0.56         0.80         0.24
0.70          66        0.55         0.81         0.25
0.70          62        0.55         0.81         0.26
0.70          57        0.54         0.81         0.27
0.70          54        0.53         0.82         0.28
0.70          51        0.53         0.82         0.29
0.70          48        0.52         0.82         0.30

Table 14.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.65

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.65          1,698     0.62         0.67         0.05
0.65          1,217     0.62         0.68         0.06
0.65          915       0.61         0.68         0.07
0.65          712       0.61         0.69         0.08
0.65          572       0.60         0.69         0.09
0.65          469       0.60         0.70         0.10
0.65          392       0.59         0.70         0.11
0.65          332       0.59         0.71         0.12
0.65          285       0.58         0.71         0.13
0.65          248       0.57         0.71         0.14
0.65          217       0.57         0.72         0.15
0.65          192       0.56         0.72         0.16
0.65          172       0.56         0.73         0.17
0.65          154       0.55         0.73         0.18
0.65          139       0.54         0.73         0.19
0.65          126       0.54         0.74         0.20
0.65          115       0.53         0.74         0.21
0.65          105       0.53         0.75         0.22
0.65          97        0.52         0.75         0.23
0.65          90        0.51         0.75         0.24
0.65          83        0.51         0.76         0.25
0.65          77        0.50         0.76         0.26
0.65          72        0.49         0.76         0.27
0.65          67        0.49         0.77         0.28
0.65          63        0.48         0.77         0.29
0.65          59        0.47         0.77         0.30


Table 15.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.60

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.60          2,082     0.57         0.62         0.05
0.60          1,495     0.57         0.63         0.06
0.60          1,123     0.56         0.63         0.07
0.60          875       0.56         0.64         0.08
0.60          701       0.55         0.64         0.09
0.60          575       0.55         0.65         0.10
0.60          480       0.54         0.65         0.11
0.60          407       0.54         0.66         0.12
0.60          349       0.53         0.66         0.13
0.60          303       0.53         0.67         0.14
0.60          266       0.52         0.67         0.15
0.60          235       0.51         0.67         0.16
0.60          209       0.51         0.68         0.17
0.60          190       0.50         0.68         0.18
0.60          169       0.50         0.69         0.19
0.60          153       0.49         0.69         0.20
0.60          140       0.48         0.69         0.21
0.60          128       0.48         0.70         0.22
0.60          118       0.47         0.70         0.23
0.60          109       0.47         0.71         0.24
0.60          100       0.46         0.71         0.25
0.60          93        0.45         0.71         0.26
0.60          87        0.45         0.72         0.27
0.60          81        0.44         0.72         0.28
0.60          76        0.44         0.73         0.29
0.60          71        0.43         0.73         0.30


Table 16.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.55

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.55          2,475     0.52         0.57         0.05
0.55          1,773     0.52         0.58         0.06
0.55          1,332     0.51         0.58         0.07
0.55          1,038     0.51         0.59         0.08
0.55          832       0.50         0.59         0.09
0.55          681       0.50         0.60         0.10
0.55          569       0.49         0.60         0.11
0.55          482       0.48         0.60         0.12
0.55          413       0.48         0.61         0.13
0.55          359       0.47         0.61         0.14
0.55          314       0.47         0.62         0.15
0.55          278       0.46         0.62         0.16
0.55          247       0.46         0.63         0.17
0.55          222       0.45         0.63         0.18
0.55          200       0.45         0.64         0.19
0.55          181       0.44         0.64         0.20
0.55          165       0.44         0.65         0.21
0.55          151       0.43         0.65         0.22
0.55          139       0.42         0.65         0.23
0.55          128       0.42         0.66         0.24
0.55          118       0.41         0.66         0.25
0.55          110       0.41         0.67         0.26
0.55          102       0.40         0.67         0.27
0.55          95        0.39         0.67         0.28
0.55          89        0.39         0.68         0.29
0.55          84        0.38         0.68         0.30


Table 17.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.50

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.50          2,860     0.47         0.53         0.05
0.50          2,049     0.47         0.53         0.06
0.50          1,542     0.46         0.53         0.07
0.50          1,199     0.46         0.54         0.08
0.50          961       0.46         0.55         0.09
0.50          787       0.45         0.55         0.10
0.50          657       0.44         0.56         0.11
0.50          556       0.44         0.56         0.12
0.50          477       0.43         0.56         0.13
0.50          414       0.42         0.56         0.14
0.50          363       0.42         0.57         0.15
0.50          320       0.41         0.58         0.16
0.50          285       0.41         0.58         0.17
0.50          255       0.40         0.58         0.18
0.50          230       0.40         0.59         0.19
0.50          209       0.40         0.60         0.20
0.50          190       0.39         0.60         0.21
0.50          174       0.38         0.60         0.22
0.50          159       0.38         0.61         0.23
0.50          147       0.37         0.61         0.24
0.50          136       0.36         0.61         0.25
0.50          126       0.36         0.62         0.26
0.50          117       0.35         0.62         0.27
0.50          109       0.34         0.63         0.28
0.50          102       0.34         0.63         0.29
0.50          96        0.33         0.63         0.30


Table 18.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.45

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.45          3,237     0.42         0.47         0.05
0.45          2,316     0.42         0.48         0.06
0.45          1,738     0.41         0.48         0.07
0.45          1,356     0.41         0.49         0.08
0.45          1,086     0.40         0.49         0.09
0.45          889       0.40         0.50         0.10
0.45          744       0.39         0.50         0.11
0.45          628       0.39         0.51         0.12
0.45          539       0.38         0.51         0.13
0.45          467       0.37         0.51         0.14
0.45          409       0.37         0.52         0.15
0.45          361       0.35         0.42         0.16
0.45          322       0.36         0.53         0.17
0.45          288       0.35         0.53         0.18
0.45          260       0.35         0.54         0.19
0.45          235       0.34         0.54         0.20
0.45          214       0.34         0.55         0.21
0.45          196       0.33         0.55         0.22
0.45          179       0.33         0.56         0.23
0.45          165       0.32         0.56         0.24
0.45          153       0.31         0.56         0.25
0.45          142       0.31         0.57         0.26
0.45          132       0.30         0.57         0.27
0.45          123       0.30         0.58         0.28
0.45          115       0.29         0.58         0.29
0.45          108       0.29         0.59         0.30


Table 19.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.40

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.40          3,587     0.37         0.42         0.05
0.40          2,568     0.37         0.43         0.06
0.40          1,931     0.36         0.43         0.07
0.40          1,504     0.36         0.44         0.08
0.40          1,205     0.35         0.44         0.09
0.40          985       0.35         0.45         0.10
0.40          822       0.34         0.45         0.11
0.40          694       0.34         0.46         0.12
0.40          597       0.33         0.45         0.13
0.40          518       0.33         0.47         0.14
0.40          453       0.32         0.47         0.15
0.40          400       0.32         0.48         0.16
0.40          356       0.31         0.48         0.17
0.40          319       0.31         0.49         0.18
0.40          287       0.30         0.49         0.19
0.40          260       0.30         0.50         0.20
0.40          237       0.29         0.50         0.21
0.40          216       0.28         0.50         0.22
0.40          198       0.28         0.51         0.23
0.40          183       0.27         0.51         0.24
0.40          169       0.27         0.52         0.25
0.40          157       0.26         0.52         0.26
0.40          146       0.26         0.53         0.27
0.40          136       0.25         0.53         0.28
0.40          127       0.25         0.54         0.29
0.40          119       0.24         0.54         0.30


Table 20.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.35

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.35          3,913     0.32         0.37         0.05
0.35          2,802     0.32         0.33         0.06
0.35          2,107     0.31         0.38         0.07
0.35          1,640     0.31         0.39         0.08
0.35          1,313     0.30         0.39         0.09
0.35          1,075     0.30         0.40         0.10
0.35          897       0.29         0.40         0.11
0.35          759       0.29         0.41         0.12
0.35          651       0.28         0.41         0.13
0.35          565       0.28         0.42         0.14
0.35          494       0.27         0.42         0.15
0.35          436       0.27         0.43         0.16
0.35          388       0.26         0.43         0.17
0.35          348       0.26         0.44         0.18
0.35          313       0.25         0.44         0.19
0.35          283       0.24         0.44         0.20
0.35          256       0.24         0.45         0.21
0.35          235       0.23         0.45         0.22
0.35          216       0.23         0.46         0.23
0.35          199       0.22         0.46         0.24
0.35          184       0.22         0.47         0.25
0.35          170       0.21         0.47         0.26
0.35          158       0.21         0.48         0.27
0.35          148       0.20         0.48         0.28
0.35          138       0.20         0.49         0.29
0.35          129       0.19         0.49         0.30

Table 21.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.30

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.30          4,203     0.27         0.32         0.05
0.30          3,015     0.27         0.33         0.06
0.30          2,263     0.26         0.33         0.07
0.30          1,746     0.26         0.34         0.08
0.30          1,413     0.25         0.34         0.09
0.30          1,156     0.25         0.35         0.10
0.30          965       0.24         0.35         0.11
0.30          816       0.24         0.36         0.12
0.30          700       0.23         0.36         0.13
0.30          607       0.23         0.37         0.14
0.30          531       0.22         0.37         0.15
0.30          469       0.22         0.38         0.16
0.30          417       0.21         0.38         0.17
0.30          373       0.21         0.39         0.18
0.30          335       0.20         0.39         0.19
0.30          304       0.20         0.40         0.20
0.30          277       0.19         0.40         0.21
0.30          253       0.19         0.41         0.22
0.30          232       0.18         0.41         0.23
0.30          214       0.18         0.42         0.24
0.30          197       0.17         0.42         0.25
0.30          183       0.16         0.42         0.26
0.30          170       0.16         0.43         0.27
0.30          158       0.15         0.43         0.28
0.30          148       0.15         0.44         0.29
0.30          138       0.14         0.44         0.30

Table 22.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.25

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.25          4,466     0.22         0.27         0.05
0.25          3,198     0.22         0.28         0.06
0.25          2,402     0.21         0.29         0.07
0.25          1,871     0.21         0.29         0.08
0.25          1,498     0.20         0.29         0.09
0.25          1,226     0.20         0.30         0.10
0.25          1,023     0.19         0.30         0.11
0.25          866       0.19         0.31         0.12
0.25          742       0.18         0.31         0.13
0.25          644       0.13         0.32         0.14
0.25          564       0.17         0.32         0.15
0.25          497       0.17         0.33         0.16
0.25          442       0.16         0.33         0.17
0.25          396       0.16         0.34         0.18
0.25          357       0.15         0.34         0.19
0.25          323       0.15         0.35         0.20
0.25          294       0.14         0.35         0.21
0.25          268       0.14         0.36         0.22
0.25          246       0.13         0.36         0.23
0.25          226       0.13         0.37         0.24
0.25          209       0.12         0.37         0.25
0.25          194       0.12         0.38         0.26
0.25          181       0.11         0.38         0.27
0.25          168       0.11         0.39         0.28
0.25          157       0.10         0.39         0.29
0.25          147       0.09         0.39         0.30

Table 23.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.20

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.20          4,784     0.17         0.22         0.05
0.20          3,615     0.17         0.23         0.06
0.20          2,522     0.16         0.23         0.07
0.20          1,963     0.16         0.24         0.08
0.20          1,574     0.15         0.24         0.09
0.20          1,288     0.15         0.25         0.10
0.20          1,076     0.14         0.25         0.11
0.20          908       0.14         0.26         0.12
0.20          778       0.13         0.26         0.13
0.20          675       0.13         0.27         0.14
0.20          591       0.12         0.27         0.15
0.20          521       0.12         0.28         0.16
0.20          464       0.11         0.28         0.17
0.20          415       0.11         0.29         0.18
0.20          374       0.10         0.29         0.19
0.20          338       0.10         0.30         0.20
0.20          308       0.09         0.30         0.21
0.20          281       0.09         0.31         0.22
0.20          258       0.08         0.31         0.23
0.20          237       0.08         0.32         0.24
0.20          219       0.07         0.32         0.25
0.20          203       0.07         0.33         0.26
0.20          189       0.16         0.33         0.27
0.20          176       0.06         0.34         0.28
0.20          164       0.15         0.34         0.29
0.20          153       0.05         0.35         0.30

Table 24.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.15

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.15          4,855     0.12         0.17         0.05
0.15          3,476     0.12         0.18         0.06
0.15          2,610     0.11         0.18         0.07
0.15          2,032     0.11         0.19         0.08
0.15          1,631     0.10         0.19         0.09
0.15          1,333     0.10         0.20         0.10
0.15          1,111     0.09         0.20         0.11
0.15          946       0.09         0.21         0.12
0.15          807       0.08         0.21         0.13
0.15          699       0.08         0.22         0.14
0.15          612       0.07         0.22         0.15
0.15          540       0.07         0.23         0.16
0.15          481       0.06         0.23         0.17
0.15          430       0.06         0.24         0.18
0.15          387       0.05         0.24         0.19
0.15          350       0.05         0.25         0.20
0.15          319       0.04         0.25         0.21
0.15          291       0.04         0.26         0.22
0.15          267       0.03         0.26         0.23
0.15          246       0.03         0.27         0.24
0.15          227       0.02         0.27         0.25
0.15          210       0.02         0.28         0.26
0.15          195       0.01         0.28         0.27
0.15          182       0.01         0.29         0.28
0.15          170       0.00         0.29         0.29
0.15          159       0.00         0.30         0.30


Table 25.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.05

Estimated
Correlation   Sample    Lower        Upper        95% Confidence
Coefficient   Size      Confidence   Confidence   Interval Size
Value         = n       Limits       Limits       = 2A

0.05          5,072     0.02         0.07         0.05
0.05          3,627     0.02         0.08         0.06
0.05          2,719     0.01         0.08         0.07
0.05          2,122     0.01         0.09         0.08
0.05          1,696     0.00         0.09         0.09
0.05          1,388     0.00         0.10         0.10
0.05          1,157     -0.01        0.10         0.11
0.05          979       -0.01        0.11         0.12
0.05          840       -0.02        0.11         0.13
0.05          728       -0.02        0.12         0.14
0.05          638       -0.03        0.12         0.15
0.05          563       -0.03        0.13         0.16
0.05          500       -0.04        0.13         0.17
0.05          448       -0.04        0.14         0.18
0.05          403       -0.05        0.14         0.19
0.05          365       -0.05        0.15         0.20
0.05          332       -0.06        0.15         0.21
0.05          303       -0.06        0.16         0.22
0.05          278       -0.07        0.16         0.23
0.05          256       -0.07        0.17         0.24
0.05          236       -0.08        0.17         0.25
0.05          219       -0.08        0.18         0.26
0.05          203       -0.09        0.18         0.27
0.05          189       -0.09        0.19         0.28
0.05          177       -0.10        0.19         0.29
0.05          165       -0.10        0.20         0.30

APPENDIX D.  GRAPHS THAT CAN BE USED TO DETERMINE SAMPLE SIZES
TO ESTIMATE CORRELATION COEFFICIENT VALUES

Figure 6.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.925 AND R = 0.85

Figure 7.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.75 AND R = 0.60

Figure 8.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.70 AND R = 0.50

Figure 9.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.40 AND R = 0.20

Figure 10.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.30 AND R = 0.10

Figure 11.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.25 AND R = 0.05

Figure 12.  REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.15 AND R = 0.00